Abstract. We present a method to discriminate instanton-induced processes from standard DIS background based on 
Range Searching. This method offers fast and automatic scanning of a large number of variables for a combination of 
variables giving high signal to background ratio and the smallest theoretical and experimental uncertainties. 



INSTANTON-INDUCED PROCESSES 

Instantons (1) are a fundamental non-perturbative aspect of QCD,
inducing hard processes that are absent in perturbation theory. The
expected cross section as calculated in "instanton-perturbation-theory"
is sufficiently large (2, 3, 4) to make an experimental discovery
possible (5, 6). For a more detailed introduction to instantons (I)
see e.g. (7).

We study the prospects of a search for I-induced events modelled by
the Monte Carlo generator QCDINS (8), which generates I-induced events
in deep-inelastic ep-scattering where a quark emerging from a
$q\bar{q}$-splitting of the exchanged photon fuses with a gluon emitted
from the proton. In the I-induced process, $q\bar{q}$-pairs of each of
the three light quark flavours and on average 2-3 gluons are produced.
In the hadronic centre-of-mass system (CMS) they form a band (about two
units wide in pseudo-rapidity) of particles with high transverse energy
which are homogeneously distributed in azimuth. Since a pair of strange
quarks is produced in every event, an increased number of kaons compared
to standard DIS events is expected in this band. Finally, the quark out
of the split photon that does not participate in the instanton
subprocess forms a hard jet.

The predicted cross section $\sigma^{(I)}_{\mathrm{HERA}} \approx 29.2$ pb
(4, 7) in the kinematic region where "instanton-perturbation-theory" is
applicable ($x_B > 10^{-3}$, $0.1 < y < 0.9$, $Q^2 > 113$ GeV$^2$) is two
orders of magnitude smaller than the DIS cross section
$\sigma_{\mathrm{DIS}} \approx 3000$ pb. Therefore the highest possible
signal-to-background ratio has to be achieved by exploiting observables
characterising I-induced processes. To find these observables, a large
number of promising event variables has to be investigated and the
sensitivity to systematic details in the modelling of the hadronic final
state has to be tested. This requires a sophisticated and fast
discrimination method to find the appropriate combination of event
variables.
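As a rough illustration of what "highest possible signal-to-background ratio" means in numbers, the following sketch uses only the two central cross-section values quoted above. It assumes, for simplicity, that selection efficiencies act directly on these cross sections (detector effects and uncertainties are ignored); the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic with the central cross sections quoted in the text.
# Assumption: efficiencies apply directly to these cross sections (no detector effects).
SIGMA_I_PB = 29.2      # instanton-induced cross section in pb
SIGMA_DIS_PB = 3000.0  # standard DIS cross section in pb


def purity(eff_signal, eff_background):
    """Fraction of instanton-induced events in a selected sample."""
    n_sig = SIGMA_I_PB * eff_signal
    n_bkg = SIGMA_DIS_PB * eff_background
    return n_sig / (n_sig + n_bkg)


def separation_needed(target_purity):
    """Ratio eff(I)/eff(DIS) required to reach the given purity."""
    # purity p = s*eI / (s*eI + b*eD)  =>  eI/eD = p/(1-p) * b/s
    return target_purity / (1.0 - target_purity) * SIGMA_DIS_PB / SIGMA_I_PB


if __name__ == "__main__":
    print("purity before any selection:", purity(1.0, 1.0))
    print("eff(I)/eff(DIS) needed for 50% purity:", separation_needed(0.5))
```

Under these assumptions the unselected sample contains about 1% signal, and the background efficiency has to be about a hundred times smaller than the signal efficiency before half of the selected events are instanton-induced.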



RANGE SEARCHING 

Events can be classified as signal or background by estimating the
probability density $\rho$ of both classes at the point of the event in
the event-variable phase space, employing a Monte Carlo (MC) generator
to sample the densities. In the case of neural networks (NN) this is
done by fitting the probability density with the adjusted weights of
the neurons. To circumvent this time-consuming procedure, the density at
each point can be estimated directly by counting the number of
background and signal events in a surrounding box V. Given the ratio

$$\mathcal{L} = \frac{\rho(I)}{\rho(\mathrm{DIS})} = \frac{\#I(V)}{\#\mathrm{DIS}(V)}$$

the probability of an event to be a signal event is
$D = \mathcal{L}/(1+\mathcal{L})$. Compared to NNs this method also has
the advantage of not extrapolating into phase space regions where no
sample events are available. Thus signals from data events outside the
region covered by the MC simulation can be avoided, whereas NNs
extrapolate into regions without sample events. Counting the number of
events in the vicinity of a certain point is a problem known as Range
Searching.
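The counting estimate can be sketched in a few lines. This is a minimal brute-force illustration, not the algorithm used in the analysis: it assumes that the signal and background MC samples are normalised to the same size, so that the ratio of counts in V approximates the density ratio. The function names, the Gaussian toy samples and the box half-widths are hypothetical.

```python
import random


def count_in_box(events, centre, half_widths):
    """Count events whose every coordinate lies inside the box around `centre`."""
    return sum(
        all(abs(x - c) <= h for x, c, h in zip(ev, centre, half_widths))
        for ev in events
    )


def discriminant(point, signal, background, half_widths, min_events=1):
    """Return D = L/(1+L) with L = #I(V)/#DIS(V), or None if V is too empty."""
    n_sig = count_in_box(signal, point, half_widths)
    n_bkg = count_in_box(background, point, half_widths)
    if n_sig + n_bkg < min_events:
        return None  # no MC events nearby: do not classify instead of extrapolating
    # L/(1+L) equals n_sig/(n_sig+n_bkg); this form avoids dividing by n_bkg = 0
    return n_sig / (n_sig + n_bkg)


# Toy samples in two variables (purely illustrative, equal sample sizes).
rng = random.Random(1)
toy_signal = [(rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)) for _ in range(2000)]
toy_background = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(2000)]

if __name__ == "__main__":
    box = (0.2, 0.2)
    print(discriminant((1.0, 1.0), toy_signal, toy_background, box))
    print(discriminant((9.0, 9.0), toy_signal, toy_background, box))
```

The second call probes a point far outside both toy samples and returns `None`, which mirrors the statement that the method does not extrapolate into regions without sample events.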

Range Searching algorithms have been developed which allow a search
time $\propto \log(n)$, where n is the number of points that have to be
searched (9). We employ an algorithm (11) especially suitable for a
large number of events and dimensions (i.e. observables). The MC events
are successively filled into the nodes of two binary trees, one for the
signal events and one for the background events. At each node the
decision to descend to the left or to the right is taken according to
the value of one of the event variables; while descending the tree, this
variable cycles through all the variables considered. After filling, the
position of every event in the tree is given by its coordinates in the
event-variable space. An event is classified by searching the trees for
all background and signal events in the box V, which is done in the same
manner as filling the tree.

FIGURE 1. The characteristic event variables providing good instanton separation with small systematic uncertainties. Shown are the
reconstructed virtuality $Q'^2_{\mathrm{rec}}$ of the quark entering the I-subprocess, the sphericity of the particles in the I-band in their rest system, the
second Fox-Wolfram moment of these particles and the event shape variable $E_{\mathrm{out},b}$, which is the projection of the particle transverse
energy onto the axis that makes this quantity maximal (see (6)). Finally the number of charged kaons in the I-band is shown.
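The tree filling and box search described above correspond to what is commonly called a k-d tree. The following is a minimal sketch of that idea, not the implementation of reference (11); the class and method names are hypothetical. One such tree would be filled with signal events and a second one with background events.

```python
import random


class Node:
    __slots__ = ("point", "left", "right")

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


class RangeTree:
    """Binary tree filled event by event; the splitting variable cycles with depth."""

    def __init__(self, n_dim):
        self.n_dim = n_dim
        self.root = None

    def insert(self, point):
        if self.root is None:
            self.root = Node(point)
            return
        node, depth = self.root, 0
        while True:
            axis = depth % self.n_dim  # variable used for the decision at this level
            side = "left" if point[axis] < node.point[axis] else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(point))
                return
            node, depth = child, depth + 1

    def count_in_box(self, lower, upper):
        """Count stored events with lower[i] <= x[i] <= upper[i] for every i."""
        count = 0
        stack = [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if node is None:
                continue
            if all(lo <= x <= hi for lo, x, hi in zip(lower, node.point, upper)):
                count += 1
            axis = depth % self.n_dim
            # Descend only into subtrees that can overlap the box along this axis.
            if lower[axis] < node.point[axis]:
                stack.append((node.left, depth + 1))
            if upper[axis] >= node.point[axis]:
                stack.append((node.right, depth + 1))
        return count


# Toy usage: uniformly distributed points in three variables (illustrative only).
rng = random.Random(7)
toy_points = [tuple(rng.random() for _ in range(3)) for _ in range(500)]
toy_tree = RangeTree(3)
for p in toy_points:
    toy_tree.insert(p)

if __name__ == "__main__":
    print(toy_tree.count_in_box((0.2, 0.2, 0.2), (0.7, 0.7, 0.7)))
```

The logarithmic search time is only expected for a reasonably balanced tree. Filling events in random order, as is natural for MC events, typically gives such a tree, whereas sorted input would degrade it.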



RESULTS 

Starting with 35 variables based on the hadronic final state, the best
12 were chosen by calculating the discriminant for all 2-combinations
(pairs) of the initial variables and taking those variables which
provide a high separation power $S = \epsilon(I)/\epsilon(\mathrm{DIS})$
at an instanton efficiency of $\epsilon(I) = 10\%$. The number of
considered variables is further reduced by calculating all
5-combinations and selecting those with the highest separation power and
a small systematic variation of the background. The systematic
uncertainty was obtained by using four standard DIS MC simulators (12)
which were tuned to data on representative hadronic final state
quantities in the range $Q^2 > 100$ GeV$^2$ at HERA (10). The variables
forming the best combination are shown in Figure 1. The separation power
for $\epsilon(I) = 10\%$ is S = 126. In Figure 2 the shape-normalised
discriminant D is shown for the I-induced and the background events, as
well as the distribution for D > 0.9 normalised to a luminosity of
100 pb$^{-1}$, which is comparable to that already collected by each of
the HERA experiments H1 and ZEUS. An event sample can be isolated in
which half of the events are instanton-induced while the I-efficiency is
still 10%.
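The figure of merit used in this selection can be sketched as follows. The function is a hypothetical illustration that assumes the discriminant D has already been evaluated for every signal and background MC event; it places the cut on D such that the requested fraction of signal events is kept and returns the resulting separation power. The last lines count the variable combinations mentioned in the text.

```python
import math


def separation_power(d_signal, d_background, eff_signal=0.10):
    """Return (S, cut) with S = eff(I)/eff(DIS) for the cut keeping eff_signal."""
    ranked = sorted(d_signal, reverse=True)
    n_keep = max(1, round(eff_signal * len(ranked)))
    cut = ranked[n_keep - 1]  # lowest D value that is still accepted
    eff_i = sum(d >= cut for d in d_signal) / len(d_signal)
    eff_dis = sum(d >= cut for d in d_background) / len(d_background)
    if eff_dis == 0:
        return float("inf"), cut  # no background MC event survives the cut
    return eff_i / eff_dis, cut


if __name__ == "__main__":
    # Number of combinations scanned in the two selection steps.
    print("pairs out of 35 variables:", math.comb(35, 2))
    print("5-combinations out of 12 variables:", math.comb(12, 5))
```

An infinite S only signals that the background MC sample is too small to populate the selected region, so in practice a lower bound from the sample size would be quoted instead.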

For a search method to be reliable and easy to apply, it is important
to have as few free parameters as possible. In the case of Range
Searching these are the number of events in the search trees, the size
of the neighbourhood V and the minimum number of events in this
neighbourhood required to classify an event. To reduce the number of
parameters for the box size, the ratios of the box edge lengths were
fixed by defining a box which contains most of the events and letting V
be a scaled version of this large box. The projections onto these box
edges are shown in Figure 1. The variation of the result with the size
of V is shown in Figure 3. For smaller boxes the separation clearly
increases with the number of events that populate the search trees,
while for larger boxes this difference vanishes. The plateau widens with
the number of tree events and reaches nearly an order of magnitude
FIGURE 2. Left: the shape-normalised discriminant D for the instanton events using QCDINS and for standard DIS events
using four MC simulators. Right: a zoom into the rightmost part, normalised to a luminosity of
$L = 100$ pb$^{-1}$. At $\epsilon(I) = 10\%$, 178 I-induced events are expected.

FIGURE 3. The separation power S at $\epsilon(I) = 10\%$ for different
box sizes and different numbers of events in the search trees.
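The one-parameter box definition discussed in the Results can be sketched as below. The text only states that the reference box "contains most of the events"; taking a central 95% interval per variable is an assumption made here for illustration, and the function names are hypothetical.

```python
def reference_half_widths(events, coverage=0.95):
    """Half-widths of a central box holding `coverage` of the events per variable."""
    n_dim = len(events[0])
    tail = (1.0 - coverage) / 2.0
    widths = []
    for axis in range(n_dim):
        values = sorted(ev[axis] for ev in events)
        lo = values[int(tail * (len(values) - 1))]
        hi = values[int((1.0 - tail) * (len(values) - 1))]
        widths.append((hi - lo) / 2.0)
    return widths


def scaled_box(centre, ref_half_widths, scale):
    """Box V around `centre`: the reference box shrunk by the single factor `scale`."""
    lower = tuple(c - scale * w for c, w in zip(centre, ref_half_widths))
    upper = tuple(c + scale * w for c, w in zip(centre, ref_half_widths))
    return lower, upper


if __name__ == "__main__":
    toy_events = [(float(i), 10.0 * i) for i in range(101)]
    ref = reference_half_widths(toy_events)
    print(scaled_box((50.0, 500.0), ref, 0.1))
```

Because the edge-length ratios are fixed by the reference box, `scale` is the only remaining box parameter, independent of the number of variables.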



CONCLUSIONS 

The multivariate discrimination method based on Range Searching
performs at least as well as a NN when applied to the search for
instantons at HERA. It is much less time-consuming and can easily be
used to scan a large number of appropriate variables automatically. The
short processing time allows extensive searches for the best
discriminating variables, taking systematic effects into account. In a
region where I-perturbation theory can be safely employed, this novel
discrimination method results in a 50% I-enriched data sample while the
I-efficiency is still 10%.
4 h compared to 20 min for the Range Search method on a Linux PC